Abstract. A well known conjecture of Wigner, Dyson, and Mehta asserts
that the (appropriately normalized) $k$-point correlation functions of the
eigenvalues of random $n \times n$ Wigner matrices in the bulk of the spectrum
converge (in various senses) to the $k$-point correlation function of the
Dyson sine process in the asymptotic limit $n \to \infty$. There has been much
recent progress on this conjecture; in particular, it has been established
under a wide variety of decay, regularity, and moment hypotheses on the
underlying atom distribution of the Wigner ensemble, and using various notions
of convergence. Building upon these previous results, we establish new
instances of this conjecture with weaker hypotheses on the atom distribution
and stronger notions of convergence. In particular, assuming only a finite
moment condition on the atom distribution, we can obtain convergence in the
vague sense, and assuming an additional regularity condition, we can upgrade
this convergence to locally $L^1$ convergence.

As an application, we determine the limiting distribution of the number
of eigenvalues $N_I$ in a short interval $I$ of length $O(1/n)$. As a corollary of
this result, we obtain an extension of a result of Jimbo et al. concerning the
behavior of spacing in the bulk.



1. Introduction 

1.1. Correlation functions. This paper is concerned with the phenomenon of
bulk universality for the eigenvalue distribution of random Wigner ensembles. To
explain this phenomenon we need some notation. Given a random Hermitian $n \times n$
matrix $M_n$, we can form the $n$ real eigenvalues (counting multiplicity), which we
order as

$$\lambda_1(M_n) \le \dots \le \lambda_n(M_n).$$

These $n$ random real variables can be viewed as describing a point process
$\sigma(M_n) := \{\lambda_1(M_n), \dots, \lambda_n(M_n)\}$. Associated to this point process, we can
define the (unnormalized) $k$-point correlation functions $R_n^{(k)} : \mathbb{R}^k \to \mathbb{R}^+$,
which we can define by duality as the unique symmetric measurable function (or
distribution) on $\mathbb{R}^k$ with the property that
$$\int_{\mathbb{R}^k} F(x_1,\dots,x_k) R_n^{(k)}(x_1,\dots,x_k)\,dx_1 \dots dx_k = k!\,\mathbf{E} \sum_{1 \le i_1 < \dots < i_k \le n} F(\lambda_{i_1}(M_n),\dots,\lambda_{i_k}(M_n)) \tag{1}$$

for any symmetric continuous, compactly supported function $F : \mathbb{R}^k \to \mathbb{R}$. In
particular, with this normalization, we have

$$\int_{\mathbb{R}^k} R_n^{(k)}(x_1,\dots,x_k)\,dx_1 \dots dx_k = \frac{n!}{(n-k)!}.$$

The $L^1$-normalized $k$-point correlation functions $p_n^{(k)} := \frac{(n-k)!}{n!} R_n^{(k)}$ are also used in
the literature, but we will stick with the above normalization.

If $M_n$ is a discrete random matrix ensemble, then $R_n^{(k)}$ can only be defined in
the sense of distributions. But if $M_n$ is a continuous random matrix ensemble
(with a continuous probability density function), then $R_n^{(k)}$ becomes a continuous
symmetric function that vanishes whenever the $x_1, \dots, x_k$ are not all distinct, and
can be defined more explicitly for distinct reals $x_1, \dots, x_k$ by the formula

$$R_n^{(k)}(x_1,\dots,x_k) = \lim_{\varepsilon \to 0} \frac{1}{\varepsilon^k} \mathbf{P}(E_{\varepsilon, x_1, \dots, x_k})$$

where $E_{\varepsilon, x_1, \dots, x_k}$ is the event that there is an eigenvalue of $M_n$ in the interval
$[x_i, x_i + \varepsilon]$ for each $1 \le i \le k$. Thus, for instance, the joint probability distribution
of the ordered eigenvalues $\lambda_1 \le \dots \le \lambda_n$ is given by the probability measure

$$n!\,\rho_n(\lambda_1,\dots,\lambda_n) 1_{\lambda_1 \le \dots \le \lambda_n}\,d\lambda_1 \dots d\lambda_n,$$

where $\rho_n := \frac{1}{n!} R_n^{(n)}$, and the remaining correlation functions $R_n^{(k)}$ can be derived
from the probability density function $\rho_n$ by the integration formula

$$R_n^{(k)}(x_1,\dots,x_k) = \frac{n!}{(n-k)!} \int_{\mathbb{R}^{n-k}} \rho_n(x_1,\dots,x_n)\,dx_{k+1} \dots dx_n.$$

The 1-point correlation function controls the density of states; indeed, for
continuous random ensembles at least, one has the formula

$$\mathbf{E} N_I = \int_I R_n^{(1)}(x)\,dx$$

for any interval $I$, where $N_I$ is the number of eigenvalues in $I$.



2. WIGNER MATRICES AND THE SEMICIRCLE LAW



We now restrict attention to a specific class of random matrix ensembles, namely 
the Wigner ensembles. 

Definition 1 (Wigner matrices). Let $\xi, \tilde\xi$ be real random variables with mean
zero and variance 1, and let $n \ge 1$ be an integer. An $n \times n$ Wigner Hermitian
matrix $M_n$ with atom distributions $\xi, \tilde\xi$ is defined to be a random Hermitian $n \times n$
matrix $M_n$ with upper triangular complex entries $\frac{1}{\sqrt{n}} \zeta_{ij} := \frac{1}{\sqrt{2n}} (\xi_{ij} + \sqrt{-1}\,\tau_{ij})$
($1 \le i < j \le n$) and diagonal real entries $\frac{1}{\sqrt{n}} \zeta_{ii}$ ($1 \le i \le n$), where the $\xi_{ij}, \tau_{ij}, \zeta_{ii}$
are jointly independent random variables, with $\xi_{ij}, \tau_{ij}$ having the distribution of $\xi$
for $1 \le i < j \le n$, and $\zeta_{ii}$ having the distribution of $\tilde\xi$ for $1 \le i \le n$.

Example 2. The famous Gaussian Unitary Ensemble (GUE) is the special case
of the Wigner ensemble in which the atom distributions $\xi, \tilde\xi$ are gaussian random
variables, $\xi, \tilde\xi \equiv N(0, 1)$. At the opposite extreme, the complex Bernoulli ensemble
is an example of a discrete Wigner ensemble in which the atom distributions $\xi, \tilde\xi$
equal $+1$ with probability 1/2 and $-1$ with probability 1/2.

For these matrices, the bulk distribution of the eigenvalues is governed by the
semicircular distribution $\rho_{sc}(x)\,dx$, where $\rho_{sc} : \mathbb{R} \to \mathbb{R}^+$ is the function

$$\rho_{sc}(x) := \frac{1}{2\pi} (4 - x^2)_+^{1/2}.$$

Indeed, we have 

Theorem 3 (Wigner semicircular law). Let $M_n$ be a Wigner Hermitian matrix
with fixed atom distributions (independent of $n$). Then the normalized 1-point
correlation function $x \mapsto \frac{1}{n} R_n^{(1)}(x)$ converges weakly to $\rho_{sc}(x)$ in the limit $n \to \infty$,
thus

$$\int_{\mathbb{R}} \frac{1}{n} R_n^{(1)}(x) F(x)\,dx \to \int_{\mathbb{R}} \rho_{sc}(x) F(x)\,dx$$

for any continuous, compactly supported function $F : \mathbb{R} \to \mathbb{R}$.

Proof. See [28]. □

The semicircular law suggests that at any bulk energy level $-2 < u < 2$, the mean
eigenvalue spacing of the eigenvalues $\lambda_i(M_n)$ in the vicinity of $u$ should be close
to $\frac{1}{n \rho_{sc}(u)}$. As such, it is natural to introduce the normalized $k$-point correlation
functions $\rho_{n,u}^{(k)} : \mathbb{R}^k \to \mathbb{R}^+$ for $1 \le k \le n$, localized to the energy level $u$, by the
formula

$$\rho_{n,u}^{(k)}(t_1,\dots,t_k) := \frac{1}{(n \rho_{sc}(u))^k} R_n^{(k)}\left(u + \frac{t_1}{n \rho_{sc}(u)}, \dots, u + \frac{t_k}{n \rho_{sc}(u)}\right). \tag{2}$$

(Again, when $M_n$ is a discrete ensemble, the $\rho_{n,u}^{(k)}$ should be interpreted as
distributions or measures rather than as symmetric functions.)

Now fix $k \ge 1$, $-2 < u < 2$, and the atom distributions $\xi, \tilde\xi$, and consider the
limiting behaviour of the normalised $k$-point correlation functions $\rho_{n,u}^{(k)}$ as $n \to \infty$.
A basic conjecture in the subject, due to Wigner, Dyson, and Mehta, can be stated
informally as follows:

Conjecture 4 (Wigner-Dyson-Mehta bulk universality conjecture, informal
version). [27, 9] Fix $k \ge 1$, $-2 < u < 2$, and atom distributions $\xi, \tilde\xi$. Then
$\rho_{n,u}^{(k)} \to \rho_{Sine}^{(k)}$ as $n \to \infty$, where the Dyson sine kernel $k$-point correlation
functions $\rho_{Sine}^{(k)} : \mathbb{R}^k \to \mathbb{R}^+$ are defined by the formula

$$\rho_{Sine}^{(k)}(t_1,\dots,t_k) := \det(K_{Sine}(t_i, t_j))_{1 \le i,j \le k} \tag{3}$$

and $K_{Sine}(t, t') := \frac{\sin(\pi(t - t'))}{\pi(t - t')}$ is the Dyson sine kernel.
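Formula (3) can be evaluated directly for small $k$. The following is a minimal sketch using only the Python standard library; the function names (`sine_kernel`, `determinant`, `rho_sine`) are illustrative choices and not notation from the paper.

```python
import math

def sine_kernel(t, s):
    """Dyson sine kernel sin(pi (t - s)) / (pi (t - s)), equal to 1 on the diagonal."""
    d = t - s
    if abs(d) < 1e-12:
        return 1.0
    return math.sin(math.pi * d) / (math.pi * d)

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def rho_sine(points):
    """k-point correlation function from (3): det(K_Sine(t_i, t_j))."""
    return determinant([[sine_kernel(ti, tj) for tj in points] for ti in points])

# For k = 2 the determinant reduces to 1 - K_Sine(t_1, t_2)^2, which vanishes
# as t_1 -> t_2: nearby points repel each other.
print(rho_sine([0.0, 0.5]))
```

For $k = 1$ the function is identically 1, reflecting the normalization of the mean spacing to 1 in (2).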
This conjecture is imprecise because the nature of the convergence of the function
(or measure) $\rho_{n,u}^{(k)}$ to $\rho_{Sine}^{(k)}$ is not specified. There are of course infinitely many
modes of convergence that one could consider. A mode of convergence which has
drawn considerable attention is vague convergence:

• (Vague convergence) For any continuous, compactly supported function
$F : \mathbb{R}^k \to \mathbb{R}^+$, one has

$$\lim_{n \to \infty} \int_{\mathbb{R}^k} F(t) \rho_{n,u}^{(k)}(t)\,dt = \int_{\mathbb{R}^k} F(t) \rho_{Sine}^{(k)}(t)\,dt.$$


In recent years, attention has also been paid to a weaker form of convergence: 

• (Average vague convergence) For any continuous, compactly supported
function $F : \mathbb{R}^k \to \mathbb{R}^+$, one has



The first objective of this paper is to provide an almost complete solution for
vague convergence (Theorem 5). Namely, in this paper we establish the following
theorem, which asserts that universality holds in the vague convergence sense under
the assumption that the atom variables have bounded high moments.

Theorem 5 (Bulk universality for vague convergence). The Wigner-Dyson-Mehta
conjecture for vague convergence is true whenever the atom distributions $\xi, \tilde\xi$ have
finite $C_0$-th moment, thus $\mathbf{E}|\xi|^{C_0}, \mathbf{E}|\tilde\xi|^{C_0} < \infty$, for some sufficiently large absolute
constant $C_0$.

Under mild assumptions (such as convergence which is locally uniform in the $u$
parameter), vague convergence also implies averaged vague convergence. However,
this implication cannot be automatically reversed unless one has some uniform
control on the regularity of the $\rho_{n,u}^{(k)}$ or $\rho_{n,u'}^{(k)}$, such as equicontinuity. We also
observe that for vague convergence one can restrict attention without loss of
generality to smooth compactly supported functions $F$, thanks to the Weierstrass
approximation theorem.

Having established the Wigner-Dyson-Mehta conjecture in this form, we would
like to raise new challenges by considering other (stronger) modes of convergence.
We focus on the following three modes (in decreasing order of strength).

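(1) (Local uniform convergence) For every compact subset $K$ of $\mathbb{R}^k$, one has

$$\lim_{n \to \infty} \sup_{t \in K} |\rho_{n,u}^{(k)}(t) - \rho_{Sine}^{(k)}(t)| = 0.$$

(2) (Local $L^1$ convergence) For every compact subset $K$ of $\mathbb{R}^k$,
one has

$$\lim_{n \to \infty} \int_K |\rho_{n,u}^{(k)}(t) - \rho_{Sine}^{(k)}(t)|\,dt = 0.$$

(3) (Weak convergence) For any $L^\infty$, compactly supported function $F : \mathbb{R}^k \to \mathbb{R}^+$,
one has

$$\lim_{n \to \infty} \int_{\mathbb{R}^k} F(t) \rho_{n,u}^{(k)}(t)\,dt = \int_{\mathbb{R}^k} F(t) \rho_{Sine}^{(k)}(t)\,dt.$$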

These three stronger notions of convergence, namely local uniform convergence,
local $L^1$ convergence, and weak convergence, are only natural for continuous Wigner
ensembles, since for discrete Wigner ensembles, $\rho_{n,u}^{(k)}$ is a discrete probability
measure and is thus supported on a set of Lebesgue measure zero.

We may now pose the question of determining the atom distributions $\xi, \tilde\xi$ for which
the Wigner-Dyson-Mehta conjecture is true for local uniform convergence (resp.
weak convergence). By examining carefully and slightly modifying an existing proof
in the current literature [11] we note the following partial result.

Theorem 6 (Bulk universality for local $L^1$ convergence). For $k \ge 1$, there exists an
integer $J_k \ge 1$ such that the following statements hold. Assume that the atom
distributions $\xi, \tilde\xi$ have a continuous distribution $e^{-V(x)} e^{-x^2}\,dx$, $e^{-\tilde V(x)} e^{-x^2}\,dx$ obeying
the bounds 

and 

\—v(x)\<c(i+x*r 

for some $C, \delta, m > 0$ and all $1 \le j \le J_k$. Then the Wigner-Dyson-Mehta conjecture
holds for these atom distributions and this value of $k$ in the local $L^1$ sense.

If $k = 2$, then one can take $J_k = 6$.

Thus, if one wants to obtain the Wigner-Dyson-Mehta conjecture in the local $L^1$
sense for a fixed value of $k$, one only needs a finite number of regularity hypotheses
on the distribution; but if one wants to use this theorem to obtain local $L^1$
convergence for all $k$, one needs an infinite number of such hypotheses. This is likely
an artefact of the proof method, however, and one should be able to get local $L^1$
convergence for all $k$ assuming only a finite amount of regularity.

2.1. Prior results. Theorem 5 is the last step of a long series of results, which we
now survey. We will focus exclusively on the bulk case $-2 < u < 2$; there are also
several analogous results in the edge case $u = \pm 2$ (see [30], [31], [32], [34], [24], [19])
which we do not discuss here.

The first results on the Wigner-Dyson-Mehta conjecture were for the GUE ensemble.
In this case, one has the explicit Gaudin-Mehta formula

$$R_n^{(k)}(x_1,\dots,x_k) = \det(K_n(x_i, x_j))_{1 \le i,j \le k}$$

for any $k \ge 1$, where the kernel $K_n(x,y)$ is given by the formula

$$K_n(x,y) := \sqrt{n} \sum_{k=0}^{n-1} P_k(\sqrt{n} x) e^{-n x^2/4} P_k(\sqrt{n} y) e^{-n y^2/4}$$

and $P_0, \dots, P_{n-1}$ are the first $n$ Hermite polynomials, normalized so that each $P_i$ is
of degree $i$, and are orthonormal with respect to the measure $e^{-x^2/2}\,dx$; see [20, 27].
We may renormalize the Gaudin-Mehta formula as

$$\rho_{n,u}^{(k)}(t_1,\dots,t_k) = \det(K_{n,u}(t_i, t_j))_{1 \le i,j \le k}$$

for any $k \ge 1$ and $-2 < u < 2$, where

$$K_{n,u}(t,t') := \frac{1}{n \rho_{sc}(u)} K_n\left(u + \frac{t}{n \rho_{sc}(u)}, u + \frac{t'}{n \rho_{sc}(u)}\right).$$

A standard calculation using the Plancherel-Rotach asymptotics of Hermite
polynomials and the Christoffel-Darboux formula (see [8]) shows that $K_{n,u}$ converges
locally uniformly to $K_{Sine}$ as $n \to \infty$, where $-2 < u < 2$ is fixed. As such, the
Wigner-Dyson-Mehta conjecture is true in the locally uniform sense, and thus also
in the local $L^1$, weak, vague, and averaged vague senses; see [1, Lemma 3.5.1] for a
proof.

Analogous results are known for much wider classes of invariant random matrix 
ensembles, see e.g. [7], [29], [4]. However, we will not discuss these results further 
here, as they do not directly impact on the case of Wigner ensembles. 

Returning to the Wigner case, the next major breakthrough was by Johansson
[23], who considered atom distributions $\xi, \tilde\xi$ that were gauss divisible in the sense
that they could be expressed as

$$\xi = e^{-t/2} \xi_1 + (1 - e^{-t})^{1/2} \xi_2; \quad \tilde\xi = e^{-t/2} \tilde\xi_1 + (1 - e^{-t})^{1/2} \tilde\xi_2,$$

where $t > 0$, $\xi_1, \tilde\xi_1$ were real random variables of mean zero and variance one, and
$\xi_2, \tilde\xi_2 \equiv N(0, 1)$ were gaussian random variables that were independent of $\xi_1, \tilde\xi_1$
respectively. Equivalently, the distributions of $\xi, \tilde\xi$ are obtained from those of $\xi_1, \tilde\xi_1$
by applying the Ornstein-Uhlenbeck process for time $t$, and the distributions of the
Wigner matrix $M_n$ are similarly obtained from an initial Wigner ensemble $M_n^{1}$
by a matrix version of the Ornstein-Uhlenbeck process, with the eigenvalues then
evolving by the process of Dyson Brownian motion. Note that gauss divisible
distributions are automatically continuous (and even smooth), and so it makes sense
to talk about convergence in the weak or locally uniform senses in this setting. The
result of [23] is then that if the Ornstein-Uhlenbeck time $t$ is fixed and positive
independently of $n$, and if the factor distributions $\xi_1, \tilde\xi_1$ have sufficiently many
moments finite, then the Wigner-Dyson-Mehta conjecture is true in the weak sense.
Informally, the results of [23] assert that the renormalized $k$-point correlation
functions $\rho_{n,u}^{(k)}$ converge to equilibrium by time $t$ for any $t > 0$ independent of $n$. The
main tool used in [23] was an explicit determinantal formula for the correlation
functions in the gauss divisible case, essentially due to Brezin and Hikami [5].

In Johansson's result, the time parameter $t > 0$ had to be independent of $n$. It was
realized by Erdős, Ramírez, Schlein, and Yau that one could obtain many further
cases of the Wigner-Dyson-Mehta conjecture if one could extend Johansson's result
to much shorter times $t$ that decayed at a polynomial rate in $n$. This was first
achieved (again in the context of weak convergence) for $t > n^{-3/4+\varepsilon}$ for an arbitrary
fixed $\varepsilon > 0$ in [10], and then extended to the essentially optimal case $t > n^{-1+\varepsilon}$ (for weak
convergence) in [11], assuming that the atom distributions $\xi, \tilde\xi$ were continuous
with a sufficiently smooth distribution (e.g. when $k = 2$ one needs
a $C^6$ type condition), and also decayed exponentially. The methods used in [11]
were an extension of those in [23], combined with an approximation argument (the
"method of time reversal") that approximated a continuous distribution by a gauss
divisible one (with a small value of $t$); the arguments in [10] are based instead on
an analysis of the Dyson Brownian motion.

Simultaneously with the results in [11], the authors in [33] introduced a different
approach, based on what we call the Four Moment Theorem, that allowed one to
extend claims such as those in the Wigner-Dyson-Mehta conjecture (in the context
of vague convergence) from one choice of atom distributions to another, provided
that the first four moments in the atom distribution $\xi$ were matching (or close to
matching), and provided that the atom distributions $\xi, \tilde\xi$ obeyed an exponential
decay condition, namely that

$$\mathbf{P}(|\xi| \ge t), \mathbf{P}(|\tilde\xi| \ge t) \le C e^{-t^c} \tag{4}$$

for all $t > 0$ and some constants $C, c > 0$. However, the distributions $\xi, \tilde\xi$ were not
required to be continuous, and in particular could be discrete. By combining the
Four Moment Theorem with the results of Johansson (which provided a rich class of
comparison distributions with which to match moments), the Wigner-Dyson-Mehta
conjecture for vague convergence was then established in [33] for atom distributions
$\xi$ that were supported on at least three points and whose third moment $\mathbf{E}\xi^3$
vanished, assuming the exponential decay condition (4) on $\xi$ and $\tilde\xi$. The vanishing
third moment condition was needed in order to obtain a strong localization result
for the individual eigenvalues $\lambda_i(M_n)$ of the Wigner matrix $M_n$. The three point
condition is a technical condition needed in order to match moments with a
better behaved ensemble, but is quite mild, as it effectively only excludes the case
of Bernoulli random ensembles.

Shortly afterwards, it was realized in [12] that the methods in [11] and [33] could be
combined to handle a wider class of atom distributions, but at the cost of weakening
vague convergence to averaged vague convergence. Specifically, in [12] the
Wigner-Dyson-Mehta conjecture for averaged vague convergence was established for atom
distributions $\xi, \tilde\xi$ that were assumed to have an exponential decay condition (4),
but for which no regularity, support, or moment conditions were imposed. The
need to retreat to averaged vague convergence was again due to the lack (at the
time) of a strong localization result of individual eigenvalues for this ensemble; in
particular, as remarked in [12], one could upgrade averaged vague convergence to
vague convergence if one re-imposed the vanishing third moment condition.

Next, in [15], the method of local relaxation flow was introduced, which provided
a new way to analyze Dyson Brownian motion on short time scales that did not
rely on explicit formulae of Brezin-Hikami type. This gave a simpler and more
general approach to universality, but for technical reasons it relied more heavily on
the ability to average in the energy parameter $u$, and so could only give progress on
the Wigner-Dyson-Mehta conjecture in the context of averaged vague convergence.
In particular, in [15] an alternate proof of the Wigner-Dyson-Mehta conjecture for
averaged vague convergence was established$^1$ for atom distributions $\xi$ supported on
at least three points and with $\xi, \tilde\xi$ obeying an exponential decay condition (4) (with
the method extending for the first time to real symmetric or symplectic ensembles as
well), with the three point condition then being removed subsequently in [18]. These
results were generalized to covariance matrices in [3], [16] and to generalised Wigner
ensembles (in which the entries need not be iid and are allowed some fluctuation in
their variances) in [17], [18].

In [35] it was observed that the exponential decay condition (4) on the atom
distributions in the Four Moment Theorem could be relaxed to a finite $C_0$-th moment
condition for some sufficiently large absolute constant $C_0$ (e.g. $C_0 = 10^4$ would
suffice). As a consequence, in many of the preceding results on the
Wigner-Dyson-Mehta conjecture, the exponential decay hypothesis (4) could be replaced with a
finite moment hypothesis.

In the very recent paper [26] (see in particular Corollary 1.3 of that paper), the 
Wigner-Dyson-Mehta conjecture for the strongest notion of convergence, namely 
local uniform convergence, was established in the k = 1 case, assuming a sufficient 
number of smoothness and moment conditions on the atom distribution. 

For more detailed discussion of the above results, see the surveys [9], [21]. 

2.2. Application to the counting function. Theorem 5 implies in particular a
moment bound for the counting function

$$N_I := \#\{1 \le i \le n : \lambda_i(M_n) \in I\}$$

on intervals in the bulk at the scale of the mean eigenvalue spacing:

Corollary 7. Suppose the atom distributions obey the bounds $\mathbf{E}|\xi|^{C_0}, \mathbf{E}|\tilde\xi|^{C_0} \le C_1$
for some sufficiently large $C_0 > 0$ and some $C_1 > 0$. Let $\varepsilon > 0$, and let
$I \subset [-2 + \varepsilon, 2 - \varepsilon]$ be an interval of length at most $K/n$ for some $K > 0$. Then for any
$k \ge 1$ one has$^2$

$$\mathbf{E} N_I^k \ll 1$$

where the implied constant depends only on $C_0, C_1, \varepsilon, K, k$.

Such an estimate was previously established in [14] under an additional regularity
hypothesis on the atom distribution, as well as a stronger (subgaussian) decay
hypothesis.

Proof. Fix $C_0, C_1, \varepsilon, K, k$, and allow all implied constants to depend on these
parameters. In view of the trivial bound $N_I \le n$ we may assume that $n$ is sufficiently large
depending on the fixed parameters. We may then take $I = [u - K/2n, u + K/2n]$
for some $-2 + \varepsilon/2 < u < 2 - \varepsilon/2$.

$^1$ The main theorem in [15] assumes a log-Sobolev condition on the atom distribution, but it is
remarked in that paper that this condition can be removed assuming the three points condition,
thanks to the Four Moment Theorem.

$^2$ Here we use the asymptotic notation $X \ll Y$ or $X = O(Y)$ if $|X| \le CY$ for some constant $C$
depending on the indicated parameters.

It suffices to show that 




Using (1), (2), we can bound 

where $F : \mathbb{R}^k \to \mathbb{R}^+$ is a smooth, compactly supported function that equals 1 on
$[-K/2, K/2]^k$. From Theorem 5 we have

$$\left| \int_{\mathbb{R}^k} F(t) \rho_{n,u}^{(k)}(t)\,dt - \int_{\mathbb{R}^k} F(t) \rho_{Sine}^{(k)}(t)\,dt \right| \to 0$$

as $n \to \infty$, and so the left-hand side is bounded in $n$. An inspection of the proof
of Theorem 5 shows that this bound only depends$^3$ on the quantities $C_0, C_1, \varepsilon, K, k$
(i.e. it is uniform in $u$ and in $\xi, \tilde\xi$ once the quantities $C_0, C_1, \varepsilon, K, k$ are fixed); thus

$$\left| \int_{\mathbb{R}^k} F(t) \rho_{n,u}^{(k)}(t)\,dt - \int_{\mathbb{R}^k} F(t) \rho_{Sine}^{(k)}(t)\,dt \right| \ll 1.$$

As $\rho_{Sine}^{(k)}$ is bounded, the claim follows. □

With a little more effort, one can obtain the asymptotic law for $N_I$ in the case
when $|I|$ is comparable to $1/n$:

Theorem 8 (Asymptotic for $N_I$). Suppose the atom distributions obey the bounds
$\mathbf{E}|\xi|^{C_0}, \mathbf{E}|\tilde\xi|^{C_0} \le C_1$ for some sufficiently large $C_0 > 0$ and some $C_1 > 0$. Let
$\varepsilon > 0$ and $K > 0$ be independent of $n$. For any $n$, let $u = u_n$ be an element of
$[-2 + \varepsilon, 2 - \varepsilon]$, and let $I = I_n$ be the interval $I := [u, u + \frac{K}{\rho_{sc}(u) n}]$. Then $N_I$ converges
in distribution to the random variable $\sum_{j=1}^\infty \xi_j$, where $\xi_j$ are independent Bernoulli
indicator random variables with expectation $\mathbf{E}\xi_j = p_j$, where $p_1 \ge p_2 \ge \dots \ge 0$ are
the eigenvalues of the (compact, positive semi-definite) integral operator
$T f(x) := \int_{[0,K]} K_{Sine}(x, y) f(y)\,dy$ on $L^2([0,K])$. In particular, the probability $\mathbf{P}(N_I = 0)$
that $I$ has no eigenvalues is equal to $\prod_{j=1}^\infty (1 - p_j) + o(1)$.

This result is well known for GUE, thanks to the theory of determinantal processes,
but the extension to arbitrary Wigner matrices (with the finite moment condition)
is new. We prove this result in Section 5.
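The limiting gap probability $\prod_j (1 - p_j)$ in Theorem 8 can be approximated numerically by discretizing the operator $T$. The sketch below is an illustration only, not part of the paper's argument: it replaces $T$ by an $m$-point midpoint-rule matrix on $[0, K]$ and takes the determinant of $1 - T$. The function names and the choice of quadrature rule are assumptions made for this example.

```python
import math

def sine_kernel(x, y):
    """Dyson sine kernel sin(pi (x - y)) / (pi (x - y)), equal to 1 on the diagonal."""
    d = x - y
    return 1.0 if abs(d) < 1e-12 else math.sin(math.pi * d) / (math.pi * d)

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def gap_probability(K, m=40):
    """Approximate det(1 - T) on L^2([0, K]) with an m-point midpoint rule."""
    h = K / m
    nodes = [(i + 0.5) * h for i in range(m)]
    mat = [[(1.0 if i == j else 0.0) - h * sine_kernel(nodes[i], nodes[j])
            for j in range(m)] for i in range(m)]
    return determinant(mat)

# Limiting probability of seeing no eigenvalue in an interval of K mean spacings.
for K in (0.1, 0.5, 1.0):
    print(K, gap_probability(K))
```

For small $K$ the output is close to $1 - K$, as one expects from the first-order expansion of the determinant, since the trace of $T$ equals $K$.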

By definition, the quantity $\prod_{j=1}^\infty (1 - p_j)$ appearing in Theorem 8 is equal to the
Fredholm determinant $\det(1 - T)$. This determinant was computed by Jimbo, Miwa,
Mori and Sato (see [22] or [1, Theorem 3.1.2]) as the solution to a certain ODE in
the length parameter $K$. As a consequence, we have

Corollary 9. With the notation and assumptions of Theorem 8, one has

$$\mathbf{P}(N_I = 0) \to \exp\left( \int_0^K \frac{f(x)}{x}\,dx \right)$$

as $n \to \infty$, where $f : \mathbb{R} \to \mathbb{R}$ is the solution of the differential equation

$$(K f'')^2 + 4(K f' - f)(K f' - f + (f')^2) = 0$$

with the asymptotics $f(K) = -\frac{K}{\pi} - \frac{K^2}{\pi^2} - \frac{K^3}{\pi^3} + O(K^4)$ as $K \to 0$.

$^3$ For the purposes of establishing Theorem 8 below, this more refined version of Theorem 5 is
not necessary, as one can use the weaker conclusion $\limsup_{n \to \infty} \mathbf{E} N_I < \infty$ instead as a substitute
for (5).

It is likely that one can also obtain asymptotic distributions for $N_I$ for longer
intervals $I$, as in [6], but we do not pursue this issue here.

We thank the anonymous referees for many useful corrections and suggestions.



3. Proof of Theorem 5 



In this section we prove Theorem 5. This proof is a simple combination of existing
arguments and results which have been obtained in the last few years. The core of
our argument is the following. In [33], the authors worked out a method to prove
universality using the Four Moment Theorem combined with Johansson's theorem.
A finer version of this theorem from [12] enables one to combine an approximate
version of the Four Moment Theorem with a version of Johansson's theorem where
the time parameter $t$ tends to zero with $n$. This gives universality under the
assumption that the third moment vanishes. The extra observation here is that we
can omit this assumption using a recent rigidity result from [19]. Details now follow.

We need the following technical definition.

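Definition 10 (Asymptotic moment matching). Let $\delta = (\delta_1, \dots, \delta_k)$ be a
sequence of $k$ positive numbers. We say that two complex random variables $\zeta$ and $\zeta'$
$\delta$-match to order $k$ if

$$|\mathbf{E}\,\mathrm{Re}(\zeta)^m \mathrm{Im}(\zeta)^l - \mathbf{E}\,\mathrm{Re}(\zeta')^m \mathrm{Im}(\zeta')^l| \le \delta_{m+l}$$

for all $m, l \ge 0$ such that $m + l \le k$.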



Set

$$\delta := (0, 0, n^{-1/2-c}, n^{-1/2-c})$$

where $c$ is a positive constant. (The first two coordinates are zero, as we would like to
keep the mean 0 and variance 1 in all models.)

The approximate version of the Four Moment Theorem is the following (implicit
in [12] and can be deduced easily from the proof of the Four Moment Theorem):

Theorem 11 (Asymptotic Four Moment Theorem). There is a small absolute
constant $c_0 > 0$ such that for every integer $k \ge 1$ the following holds. Let
$M_n = \frac{1}{\sqrt{n}} (\zeta_{ij})_{1 \le i,j \le n}$ and $M'_n = \frac{1}{\sqrt{n}} (\zeta'_{ij})_{1 \le i,j \le n}$ be two Wigner Hermitian matrices where
the atom distributions have finite $C_0$-th moment for some sufficiently large $C_0$.
Assume furthermore that for any $1 \le i < j \le n$, $\zeta_{ij}$ and $\zeta'_{ij}$ $\delta$-match to order 4 and
for any $1 \le i \le n$, $\zeta_{ii}$ and $\zeta'_{ii}$ match to order 2. Set $A_n := n M_n$ and $A'_n := n M'_n$,
and let $G : \mathbb{R}^k \to \mathbb{R}$ be a smooth function obeying the derivative bounds

$$|\nabla^j G(x)| \le n^{c_0} \tag{6}$$

for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$. Then for any $1 \le i_1 < i_2 < \dots < i_k \le n$, and for $n$
sufficiently large we have

$$|\mathbf{E}(G(\lambda_{i_1}(A_n), \dots, \lambda_{i_k}(A_n))) - \mathbf{E}(G(\lambda_{i_1}(A'_n), \dots, \lambda_{i_k}(A'_n)))| \le n^{-c_0}. \tag{7}$$

Remark 12. In the version of this theorem in [33], [12] one required the atom
distributions of both $M_n$ and $M'_n$ to obey an exponential decay condition (4),
rather than merely having a finite $C_0$-th moment. However, the only reason why this
exponential decay was needed was to obtain a lower tail estimate for eigenvalue
gaps (see [33, Theorem 19]). It was subsequently observed in [35, Remark
38] that (by using a certain truncated version of the Four Moment Theorem) one
could extend this lower tail estimate to the case of Wigner matrices whose atom
distribution had finite $C_0$-th moment, and so Theorem 11 can also be extended to
this regime.



We fix the atom distributions $\xi, \tilde\xi$ as in Theorem 5, fix a $k \ge 1$, and fix a continuous compactly
supported function $F : \mathbb{R}^k \to \mathbb{R}$; we will also need a small constant $\varepsilon > 0$ depending
on $k$. We allow all implied constants in asymptotic notation
to depend on these quantities. It then suffices to show that

$$\int_{\mathbb{R}^k} F \rho_{n,u}^{(k)}(t_1,\dots,t_k)\,dt_1 \dots dt_k = \int_{\mathbb{R}^k} F \rho_{Sine}^{(k)}(t_1,\dots,t_k)\,dt_1 \dots dt_k + o(1). \tag{8}$$



Using the Stone-Weierstrass theorem to approximate a continuous compactly
supported function uniformly by a smooth function of uniformly bounded support, we
may assume without loss of generality that $F$ is smooth. (The error in doing so
can be upper bounded by a further application of (8) applied to a slightly wider
function $F$, taking advantage of the local integrability of $\rho_{Sine}^{(k)}$.)

We may assume that $n$ is sufficiently large depending on these quantities. We also
need a small absolute constant $\varepsilon > 0$ ($\varepsilon := 10^{-2}$ will suffice). The distribution $\xi$
need not be bounded, but it is easy to see (e.g. using [33, Lemma 28]) that there
exists a bounded distribution $\xi'$ which matches moments with $\xi$ to fourth order in
the sense that $\mathbf{E}\xi^i = \mathbf{E}(\xi')^i$ for $i = 1, 2, 3, 4$. Next, we set $t := n^{-1+\varepsilon}$ and introduce
the modified atom distribution $\xi''$ defined by the formula

$$\xi'' := e^{-t/2} \xi' + (1 - e^{-t})^{1/2} g$$

where $g \equiv N(0, 1)$ is independent of $\xi'$. A routine computation shows that

$$\mathbf{E}(\xi'')^i = \mathbf{E}\xi^i \tag{9}$$

for $i = 1, 2$ and

$$\mathbf{E}(\xi'')^i = \mathbf{E}\xi^i + O(n^{-1+\varepsilon}) \tag{10}$$

for $i = 3, 4$; thus $\xi$ and $\xi''$ have approximately matching moments to fourth order.
We define $\tilde\xi'$ and $\tilde\xi''$ from $\tilde\xi$ analogously.
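The routine computation behind (9) and (10) is a binomial expansion using the independence of $g$ and the gaussian moments $\mathbf{E}g^j = 1, 0, 1, 0, 3$ for $j = 0, \dots, 4$. The sketch below carries it out numerically for a real variable; the function name `ou_moments` and the sample third and fourth moments (0.7 and 2.5) are hypothetical values chosen for illustration, not data from the paper.

```python
import math

GAUSSIAN_MOMENTS = [1.0, 0.0, 1.0, 0.0, 3.0]  # E g^j for g = N(0,1), j = 0..4

def ou_moments(base_moments, t):
    """Moments E(xi'')^i, i = 0..4, of xi'' = e^{-t/2} xi' + (1 - e^{-t})^{1/2} g.

    base_moments[j] is E(xi')^j for j = 0..4, and g = N(0,1) is independent of xi'.
    """
    a = math.exp(-t / 2.0)
    b = math.sqrt(1.0 - math.exp(-t))
    return [
        sum(math.comb(i, j) * a**j * base_moments[j]
            * b**(i - j) * GAUSSIAN_MOMENTS[i - j]
            for j in range(i + 1))
        for i in range(5)
    ]

# Hypothetical atom distribution: mean 0, variance 1, third moment 0.7, fourth moment 2.5.
base = [1.0, 0.0, 1.0, 0.7, 2.5]
n, eps = 10**6, 1e-2
t = n ** (-1.0 + eps)
moments = ou_moments(base, t)
# The first two moments are preserved; the third and fourth move by O(t).
print(t, [m - b0 for m, b0 in zip(moments, base)])
```

The mean and variance are preserved for every $t$ because $e^{-t} + (1 - e^{-t}) = 1$, while the third and fourth moments change by an amount proportional to $t = n^{-1+\varepsilon}$ to leading order.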
Let $M''_n$ be the random matrix ensemble defined similarly to $M_n$, but with the
atom distributions $\xi, \tilde\xi$ replaced by $\xi'', \tilde\xi''$. This is another Wigner ensemble whose
atom distribution is now gauss divisible with time parameter $t$, and which also obeys
an exponential decay condition (4) (being the sum of a bounded distribution and
a gaussian distribution). As such, one can invoke [12, Proposition 4] and conclude
that $M''_n$ already obeys the required conclusion, thus

$$\int_{\mathbb{R}^k} F (\rho_{n,u}^{(k)})''(t_1,\dots,t_k)\,dt_1 \dots dt_k = \int_{\mathbb{R}^k} F \rho_{Sine}^{(k)}(t_1,\dots,t_k)\,dt_1 \dots dt_k + o(1),$$

where $(\rho_{n,u}^{(k)})''$ is defined analogously to $\rho_{n,u}^{(k)}$ but with $M_n$ replaced by $M''_n$. It thus
suffices to show that

$$\int_{\mathbb{R}^k} F (\rho_{n,u}^{(k)})''(t_1,\dots,t_k)\,dt_1 \dots dt_k = \int_{\mathbb{R}^k} F \rho_{n,u}^{(k)}(t_1,\dots,t_k)\,dt_1 \dots dt_k + o(1),$$

or equivalently (after a rescaling) that

$$\sum_{1 \le i_1 < \dots < i_k \le n} \mathbf{E} G(n\lambda_{i_1}(M_n), \dots, n\lambda_{i_k}(M_n)) = \sum_{1 \le i_1 < \dots < i_k \le n} \mathbf{E} G(n\lambda_{i_1}(M''_n), \dots, n\lambda_{i_k}(M''_n)) + o(1)$$

where $G : \mathbb{R}^k \to \mathbb{R}$ is the function

$$G(t_1,\dots,t_k) := F(\rho_{sc}(u)(t_1 - nu), \dots, \rho_{sc}(u)(t_k - nu)).$$

Note that the expression $G(n\lambda_{i_1}(M_n), \dots, n\lambda_{i_k}(M_n))$ is only non-zero when we
have

$$\lambda_{i_1}(M_n), \dots, \lambda_{i_k}(M_n) = u + O(1/n). \tag{11}$$

Using the crude upper bound from [33, Proposition 66], we see that with
probability $1 - o(n^{-k})$, the number of eigenvalues $\lambda_i(M_n)$ or $\lambda_i(M''_n)$ that lie in the
range (11) is $O(n^{o(1)})$. The exceptional event of probability $o(n^{-k})$ contributes
at most $o(1)$ to the expression to be estimated and can thus be ignored. In the
remaining event, we see that the sum $\sum_{1 \le i_1 < \dots < i_k \le n} G(n\lambda_{i_1}(M_n), \dots, n\lambda_{i_k}(M_n))$
is at most $O(n^{o(1)})$, and similarly for $M''_n$. Thus we may in fact discard any event
of probability $O(n^{-c})$ for any $c > 0$.

We now need the following rigidity of eigenvalues theorem: 

Theorem 13 (Rigidity of eigenvalues). If $0 < \varepsilon, \kappa < 1$ are independent of $n$, and
$-2 + \kappa < u < 2 - \kappa$, then for sufficiently large $n$, one has

$$N_{[-2,u]}(M_n) = n \int_{-2}^{u} \rho_{sc}(x)\,dx + O(n^{\varepsilon})$$

with probability $1 - O(n^{-c})$ for some absolute constant $c > 0$.



Proof. If the Wigner matrix has exponential decay, then this follows from [19,
Theorem 2.2] or [18, Theorem 6.3] (and in this case one obtains a much higher
probability of success, in particular obtaining $1 - O(n^{-A})$ for any $A$). The general
case then follows from the Four Moment Theorem. □
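The main term $n \int_{-2}^{u} \rho_{sc}(x)\,dx$ in Theorem 13 has a closed form, which makes the predicted index of an eigenvalue near a bulk energy $u$ easy to tabulate. The following is a small illustrative sketch; the function names are assumptions made for this example, and the antiderivative used is the standard one for the semicircle density.

```python
import math

def rho_sc(x):
    """Semicircle density (1 / (2 pi)) (4 - x^2)_+^{1/2}."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)

def semicircle_cdf(u):
    """Closed form of the integral of rho_sc over [-2, u], for -2 <= u <= 2."""
    u = min(max(u, -2.0), 2.0)
    return (0.5 + u * math.sqrt(4.0 - u * u) / (4.0 * math.pi)
            + math.asin(u / 2.0) / math.pi)

def classical_index(n, u):
    """Leading term n * int_{-2}^{u} rho_sc(x) dx of the eigenvalue count in [-2, u]."""
    return n * semicircle_cdf(u)

def mean_spacing(n, u):
    """Heuristic mean eigenvalue spacing 1 / (n rho_sc(u)) near a bulk energy u."""
    return 1.0 / (n * rho_sc(u))

# Predicted index and local spacing at a few bulk energies for n = 1000.
for u in (-1.0, 0.0, 1.5):
    print(u, classical_index(1000, u), mean_spacing(1000, u))
```

Rigidity says that the actual count deviates from `classical_index(n, u)` by only $O(n^{\varepsilon})$ with high probability, which is what allows the indices $i_1, \dots, i_k$ to be localized in the next step of the argument.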
Applying this theorem (with $u$ replaced by $u \pm n^{-1+2\varepsilon}$, say), we see that with
probability $1 - O(n^{-c})$, the event (11) only occurs when

$$i_1, \dots, i_k = n \int_{-2}^{u} \rho_{sc}(x)\,dx + O(n^{\varepsilon}). \tag{12}$$

By the preceding discussion we may discard the exceptional event of probability
$O(n^{-c})$ in which the above assertion fails. We thus see that up to errors of $o(1)$, we
may localize all the indices $i_1, \dots, i_k$ to the regime (12). By the triangle inequality,
it thus suffices to show that

$$\mathbf{E} G(n\lambda_{i_1}(M_n), \dots, n\lambda_{i_k}(M_n)) = \mathbf{E} G(n\lambda_{i_1}(M''_n), \dots, n\lambda_{i_k}(M''_n)) + O(n^{-c_0})$$

for all $1 \le i_1 < \dots < i_k \le n$ and some absolute constant $c_0 > 0$ independent of $\varepsilon$.
But this follows from Theorem 11.



4. Proof of Theorem 6 



The proof of this theorem is based on a careful inspection of the proof of the
weak convergence result in [11]. From [11, Theorem 1.1] (in the $k = 2$ case) and
[11, Remark 1.1] (for the general $k$ case) we obtain the weak convergence for all
functions $F$ which are bounded and of compact support; thus we have

$$\left| \int_{\mathbb{R}^k} F(t) \rho_{n,u}^{(k)}(t)\,dt - \int_{\mathbb{R}^k} F(t) \rho_{Sine}^{(k)}(t)\,dt \right| \le o(1) \tag{13}$$

whenever $F$ is bounded and supported in a compact set $K$.

Let us now inspect the bounds on the convergence rate $o(1)$ in (13) that come from
the argument in [11, §4]. That argument controls the left-hand side of (13) by two
expressions, denoted (I) and (II) in [11, §4]. The first error term (I) is shown to
decay exponentially in $n$, and the dependence on $F$ only appears through a factor
of $\|F\|_{L^\infty}$. The error term (II) is bounded using [11, Proposition 3.1], which in our
notation is an estimate of the form

$$\left| \int_{\mathbb{R}^k} F(t) (\rho_{n,u}^{(k)})'(t)\,dt - \int_{\mathbb{R}^k} F(t) \rho_{Sine}^{(k)}(t)\,dt \right| \le o(1) \tag{14}$$

where $(\rho_{n,u}^{(k)})'$ is the $k$-point correlation function corresponding to a certain gauss
divisible Wigner matrix.

The bound (14) is in turn proven using [11, Proposition 3.3]. The convergence of 
the $k$-point correlation function provided by [11, Proposition 3.3] is uniform on 
bounded sets. If one uses this uniform convergence in the argument used to prove 
[11, Proposition 3.1], we see that the upper bound on (14) is actually of the form 
$c(n)\|F\|_{L^\infty}$, where $c(n) \to 0$ as $n \to \infty$ depends on the support $K$ of $F$, but is 
otherwise independent of $F$. Returning to the estimation of the term (II) in [11, 
§4], we conclude a similar bound for (II). Putting all this together, we obtain a 
bound of the form 

$$\left| \int_{\mathbb{R}^k} F(t)\rho^{(k)}_{n,u}(t)\,dt - \int_{\mathbb{R}^k} F(t)\rho^{(k)}_{\mathrm{Sine}}(t)\,dt \right| \le c'(n)\|F\|_{L^\infty}$$ 






where $c'(n) \to 0$ as $n \to \infty$ depends on $K$ but is otherwise independent of $F$. By 
duality, this implies that 

$$\int_K \left| \rho^{(k)}_{n,u}(t) - \rho^{(k)}_{\mathrm{Sine}}(t) \right|\,dt \le c'(n)$$ 

which gives the local $L^1$ convergence. 
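
The duality step can be spelled out as follows. This is a sketch under the assumption that $\rho^{(k)}_{n,u}$ is a locally integrable function on $K$ (as the regularity hypothesis is meant to guarantee) and that the bound above is applied to a bounded measurable test function; the specific choice of $F$ below is ours and is not taken from [11].

```latex
% Test function: the sign of the difference, cut off to K (so |F| <= 1).
\[
F(t) := 1_K(t)\,\operatorname{sgn}\bigl(\rho^{(k)}_{n,u}(t) - \rho^{(k)}_{\mathrm{Sine}}(t)\bigr),
\qquad \|F\|_{L^\infty} \le 1 .
\]
% Inserting this F into the bound with constant c'(n):
\[
\int_K \bigl|\rho^{(k)}_{n,u}(t) - \rho^{(k)}_{\mathrm{Sine}}(t)\bigr|\,dt
= \int_{\mathbb{R}^k} F(t)\rho^{(k)}_{n,u}(t)\,dt
- \int_{\mathbb{R}^k} F(t)\rho^{(k)}_{\mathrm{Sine}}(t)\,dt
\le c'(n)\,\|F\|_{L^\infty} \le c'(n).
\]
```

Since $c'(n)$ depends only on $K$ and not on $F$, the fact that this $F$ varies with $n$ causes no difficulty.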

Remark 14. A similar inspection of the proof of [23, Theorem 1.2] (and [23, Lemma 
3.1]) reveals that the weak convergence result in [23] could in fact also be 
retroactively upgraded to local $L^1$ convergence in a similar fashion. It is likely that one can 
reduce the regularity and decay hypotheses in the above theorem, for instance by 
using the methods indicated in [11, Section 5]. However, some minimal regularity 
hypothesis is certainly needed, as local $L^1$ convergence is of course not possible in 
the case of discrete distributions. It is also likely that one can upgrade local $L^1$ 
convergence further to local uniform convergence under a sufficiently strong 
regularity hypothesis, especially in view of [11, Proposition 3.3] (and also [26, Corollary 
1.3] for the $k = 1$ case). 



5. Proof of Theorem 8 



We now prove Theorem 8. Fix $\varepsilon, K, C_0, C_1$; we allow all implied constants to 
depend on these quantities. From the trace formula 

$$\sum_{j=1}^{\infty} p_j = \int_{[0,K]} K_{\mathrm{Sine}}(x,x)\,dx = K$$ 

we see that the $p_j$ are absolutely summable. In fact, we have a stronger decay 
property: 

Lemma 15 (Decay faster than any polynomial). Let $A > 0$. Then one has $p_j \ll_A j^{-A}$ 
for all $j$. 



Proof. The Dyson kernel $K_{\mathrm{Sine}}$ is smooth on $[0,K] \times [0,K]$, and can thus be 
smoothly extended to a function on the torus $(\mathbb{R}/2K\mathbb{Z}) \times (\mathbb{R}/2K\mathbb{Z})$. By a Fourier 
expansion, one can thus approximate $K_{\mathrm{Sine}}(x,y)$ uniformly to error $O_A(M^{-2A})$ by 
a Fourier series 

$$\sum_{a=-M}^{M} \sum_{b=-M}^{M} c_{a,b}\, e^{2\pi i a x/2K} e^{2\pi i b y/2K}$$ 

for any positive integer $M$; thus 

$$K_{\mathrm{Sine}}(x,y) = \sum_{a=-M}^{M} \sum_{b=-M}^{M} c_{a,b}\, e^{2\pi i a x/2K} e^{2\pi i b y/2K} + O_A(M^{-2A})$$ 

for all $x, y \in [0,K]$. This decomposition of the kernel $K_{\mathrm{Sine}}$ induces a corresponding 
decomposition of the integral operator 

$$Tf(x) = \int_{[0,K]} K_{\mathrm{Sine}}(x,y) f(y)\,dy$$ 

as $T = T_1 + T_2$, where 

$$T_1 f(x) = \sum_{a=-M}^{M} \sum_{b=-M}^{M} c_{a,b}\, e^{2\pi i a x/2K} \int_{[0,K]} e^{2\pi i b y/2K} f(y)\,dy$$ 

and $T_2$ is an operator with operator norm $O_A(M^{-2A})$. 

Observe that $T_1$ is a finite rank operator, with rank at most $(2M+1)^2$. By 
the Courant-Fischer min-max theorem, we thus see that apart from the $(2M+1)^2$ 
largest eigenvalues, all other eigenvalues of $T$ are of size $O_A(M^{-2A})$. In other words, 
$p_j = O_A(M^{-2A})$ whenever $j > (2M+1)^2$, and the claim follows. □ 
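
As a numerical sanity check on the trace formula and on Lemma 15 (it plays no role in the proof), one can discretise $T$ and look at its spectrum. The sketch below assumes the usual normalisation $K_{\mathrm{Sine}}(x,y) = \sin(\pi(x-y))/(\pi(x-y))$, which is consistent with $K_{\mathrm{Sine}}(x,x) = 1$ in the trace formula. The Gauss-Legendre (Nystrom) discretisation, the helper name `sine_kernel_eigenvalues`, the value $K = 4$ and the number of nodes are illustrative choices of ours.

```python
import numpy as np


def sine_kernel_eigenvalues(K, m=200):
    """Approximate the eigenvalues p_j of T on L^2([0, K]).

    Uses an m-point Gauss-Legendre (Nystrom) discretisation; this is an
    illustrative numerical scheme, not part of the argument in the text.
    """
    nodes, weights = np.polynomial.legendre.leggauss(m)  # rule on [-1, 1]
    x = 0.5 * K * (nodes + 1.0)                          # nodes mapped to [0, K]
    w = 0.5 * K * weights                                # rescaled weights
    # np.sinc(z) = sin(pi z) / (pi z), with the value 1 at z = 0.
    kernel = np.sinc(x[:, None] - x[None, :])
    # sqrt(w_i) K(x_i, x_j) sqrt(w_j) is symmetric and has the same
    # spectrum as the quadrature version of the integral operator.
    s = np.sqrt(w)
    A = s[:, None] * kernel * s[None, :]
    return np.sort(np.linalg.eigvalsh(A))[::-1]          # descending order


K = 4.0
p = sine_kernel_eigenvalues(K)
print("sum of eigenvalues:", p.sum(), "(trace formula predicts K =", K, ")")
for j in (1, 4, 8, 12, 16):
    print("p_%d ~ %.3e" % (j, p[j - 1]))
```

In such an experiment the eigenvalues stay close to 1 for roughly the first $K$ indices and then fall off very quickly, in line with the faster-than-polynomial decay asserted by Lemma 15.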



This implies a subgaussian bound on $\sum_{j=1}^{\infty} \xi_j$: 

Lemma 16. For any $\lambda > 0$, one has 

$$\mathbf{P}\Bigl( \sum_{i=1}^{\infty} \xi_i \ge \lambda \Bigr) \ll e^{-c\lambda^2}$$ 

for some constant $c > 0$ independent of $n$. 



Proof. We may assume without loss of generality that $\lambda > 1$. Because each $\xi_j$ is 
bounded by 1, we have 

$$\mathbf{P}\Bigl( \sum_{j=1}^{\infty} \xi_j \ge \lambda \Bigr) \le \mathbf{P}\Bigl( \sum_{j > \lambda/2} \xi_j \ge \lambda/2 \Bigr).$$ 

From Lemma 15, the random variable $\sum_{j > \lambda/2} \xi_j$ has mean and variance $O_A(\lambda^{-A})$ 
for any $A > 0$. The claim then follows from the Chernoff inequality. □ 

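As a consequence of the above lemma and the Carleman theorem (see e.g. [2]), 
the distribution of $\sum_{j=1}^{\infty} \xi_j$ is determined uniquely by its moments 
$\mathbf{E}\bigl(\sum_{j=1}^{\infty} \xi_j\bigr)^k$ for $k = 1, 2, \ldots$. To prove Theorem 8, it thus suffices 
(by Prokhorov's theorem) to show that 

$$\lim_{n \to \infty} \mathbf{E} N_I^k = \mathbf{E}\Bigl( \sum_{j=1}^{\infty} \xi_j \Bigr)^k$$ 

for $k = 1, 2, \ldots$. As the monomial $n^k$ can be expressed as a linear combination of 
the binomial coefficients $\binom{n}{j}$ for $j = 1, \ldots, k$, it suffices to show that 

$$\lim_{n \to \infty} \mathbf{E}\binom{N_I}{k} = \mathbf{E}\binom{\sum_{j=1}^{\infty} \xi_j}{k}$$ 

for $k = 1, 2, \ldots$. 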
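
The change of basis from monomials to binomial coefficients can be made explicit. One standard way to write it (the text does not specify which expansion is intended, so this is an assumption on our part) uses the Stirling numbers of the second kind $S(k,j)$, via $n^k = \sum_{j=1}^{k} S(k,j)\, j!\, \binom{n}{j}$ for $k \ge 1$. The helper names below are hypothetical.

```python
from math import comb, factorial


def stirling2(k, j):
    """Stirling number of the second kind S(k, j), via the usual recurrence."""
    if k == j:
        return 1
    if j == 0 or j > k:
        return 0
    return j * stirling2(k - 1, j) + stirling2(k - 1, j - 1)


def monomial_via_binomials(n, k):
    """Evaluate n^k as sum_{j=1}^k S(k, j) * j! * C(n, j), for k >= 1."""
    return sum(stirling2(k, j) * factorial(j) * comb(n, j)
               for j in range(1, k + 1))


# Compare with direct exponentiation on a small grid of n and k.
for k in range(1, 5):
    print(k, [monomial_via_binomials(n, k) == n ** k for n in range(6)])
```

Because the coefficients $S(k,j)\, j!$ do not depend on $n$, convergence of the binomial moments for each fixed $k$ gives convergence of the ordinary moments by linearity.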

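By (1), (2), one has 

$$\mathbf{E}\binom{N_I}{k} \le \frac{1}{k!} \int_{\mathbb{R}^k} F(t)\rho^{(k)}_{n,u}(t)\,dt$$ 

whenever $F$ is a continuous compactly supported function with $F \ge 1_{[0,K]^k}$ pointwise, 
and similarly 

$$\mathbf{E}\binom{N_I}{k} \ge \frac{1}{k!} \int_{\mathbb{R}^k} F(t)\rho^{(k)}_{n,u}(t)\,dt$$ 

whenever $F$ is a continuous compactly supported function with $F \le 1_{[0,K]^k}$ pointwise. 
Taking limits using Theorem 5, and then taking advantage of the bounded 
nature of $\rho^{(k)}_{\mathrm{Sine}}$ (and Urysohn's lemma), we conclude that 

$$\lim_{n \to \infty} \mathbf{E}\binom{N_I}{k} = \frac{1}{k!} \int_{[0,K]^k} \rho^{(k)}_{\mathrm{Sine}}(t)\,dt.$$ 

Meanwhile we have${}^{4}$ 

$$\mathbf{E}\binom{\sum_{j=1}^{\infty} \xi_j}{k} = \frac{1}{k!} \sum_{j_1, \ldots, j_k \ge 1,\ \mathrm{distinct}} p_{j_1} \cdots p_{j_k}$$ 

so it suffices to show that 

$$\int_{[0,K]^k} \rho^{(k)}_{\mathrm{Sine}}(t)\,dt = \sum_{j_1, \ldots, j_k \ge 1,\ \mathrm{distinct}} p_{j_1} \cdots p_{j_k}.$$ 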
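
The binomial moment of a sum of independent Bernoulli variables can be checked by brute force on a finite example. The sketch below assumes that the $\xi_j$ are independent Bernoulli variables with $\mathbf{E}\xi_j = p_j$, and verifies that $\mathbf{E}\binom{\sum_j \xi_j}{k}$ equals the sum of $p_{j_1} \cdots p_{j_k}$ over $j_1 < \cdots < j_k$ (equivalently, $1/k!$ times the sum over distinct ordered indices). The function names and the sample probabilities are made up for illustration.

```python
from itertools import combinations, product
from math import comb, prod


def binomial_moment_bruteforce(p, k):
    """E C(S, k) for S a sum of independent Bernoulli(p_j), by enumeration."""
    total = 0.0
    for outcome in product((0, 1), repeat=len(p)):
        # Probability of this particular 0/1 pattern.
        prob = prod(pj if bit else 1.0 - pj for pj, bit in zip(p, outcome))
        total += prob * comb(sum(outcome), k)
    return total


def elementary_symmetric(p, k):
    """Sum of p_{j_1} ... p_{j_k} over strictly increasing indices."""
    return sum(prod(c) for c in combinations(p, k))


p_demo = [0.9, 0.5, 0.2, 0.05]  # hypothetical success probabilities
for k in range(1, 5):
    print(k, binomial_moment_bruteforce(p_demo, k), elementary_symmetric(p_demo, k))
```

The identity holds because $\binom{S}{k}$ counts the $k$-element sets of indices $j$ with $\xi_j = 1$, and each such set occurs with probability $p_{j_1} \cdots p_{j_k}$ by independence.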

By the spectral theorem, we may write 

$$K_{\mathrm{Sine}}(x,y) = \sum_{j=1}^{\infty} p_j \phi_j(x)\phi_j(y)$$ 

for some orthonormal real sequence $\phi_j \in L^2([0,K])$, and so (by (3) and the 
multilinearity of determinant) 

$$\rho^{(k)}_{\mathrm{Sine}}(t_1, \ldots, t_k) = \sum_{j_1, \ldots, j_k \ge 1} p_{j_1} \cdots p_{j_k} \det\bigl(\phi_{j_a}(t_a)\phi_{j_a}(t_b)\bigr)_{1 \le a,b \le k}.$$ 

It thus suffices to show that the integral 

(15) $$\int_{[0,K]^k} \det\bigl(\phi_{j_a}(t_a)\phi_{j_a}(t_b)\bigr)_{1 \le a,b \le k}\,dt_1 \ldots dt_k$$ 

equals 1 when the $j_1, \ldots, j_k$ are distinct, and vanishes otherwise. 

If two of the $j_a$ are equal, then we see that two of the rows in the determinant are 
linearly dependent, so (15) indeed vanishes. Now suppose that the $j_1, \ldots, j_k$ are all 
distinct. By cofactor expansion, we may then write (15) as 

$$\sum_{\sigma \in S_k} \operatorname{sgn}(\sigma) \int_{[0,K]^k} \prod_{b=1}^{k} \phi_{j_b}(t_b)\phi_{j_{\sigma(b)}}(t_b)\,dt_b.$$ 

By the orthonormality of the $\phi_j$ and Fubini's theorem, the integral here equals 1 
when $\sigma$ is the identity permutation and vanishes otherwise. The claim follows. 
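
For concreteness, the case $k = 2$ of the computation of (15) can be written out in full. This is only an illustrative special case of the argument above; $\delta$ denotes the Kronecker delta and $\langle \cdot, \cdot \rangle$ the inner product on $L^2([0,K])$.

```latex
% The 2 x 2 determinant with entries \phi_{j_a}(t_a)\phi_{j_a}(t_b):
\[
\det\begin{pmatrix}
\phi_{j_1}(t_1)\phi_{j_1}(t_1) & \phi_{j_1}(t_1)\phi_{j_1}(t_2) \\
\phi_{j_2}(t_2)\phi_{j_2}(t_1) & \phi_{j_2}(t_2)\phi_{j_2}(t_2)
\end{pmatrix}
= \phi_{j_1}(t_1)^2 \phi_{j_2}(t_2)^2
- \phi_{j_1}(t_1)\phi_{j_2}(t_1)\,\phi_{j_1}(t_2)\phi_{j_2}(t_2).
\]
% Integrating over [0,K]^2 and using Fubini's theorem and orthonormality:
\[
\int_{[0,K]^2} \Bigl( \phi_{j_1}(t_1)^2 \phi_{j_2}(t_2)^2
- \phi_{j_1}(t_1)\phi_{j_2}(t_1)\,\phi_{j_1}(t_2)\phi_{j_2}(t_2) \Bigr)\,dt_1\,dt_2
= \|\phi_{j_1}\|_{L^2}^2 \|\phi_{j_2}\|_{L^2}^2 - \langle \phi_{j_1}, \phi_{j_2} \rangle^2
= 1 - \delta_{j_1 j_2}.
\]
```

So for $k = 2$ the integral is 1 when $j_1 \ne j_2$ and 0 when $j_1 = j_2$, as claimed.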



Remark 17. An inspection of the above argument reveals that one can replace 
the interval $[0,K]$ by an arbitrary compact set $A$, with the interval $I$ then being 
replaced by the set $\{u + \frac{t}{n\rho_{sc}(u)} : t \in A\}$. We leave the details to the interested 
reader. 



${}^{4}$Note that all formal interchanges of summation or integration in this argument can be easily 
justified using Lemma 15. 